Trainable High Resolution Melt Curve Machine Learning Classifier for Large-Scale Reliable Genotyping of Sequence Variants

Authors

  • Pornpat Athamanolap
  • Vishwa Parekh
  • Stephanie I. Fraley
  • Vatsal Agarwal
  • Dong J. Shin
  • Michael A. Jacobs
  • Tza-Huei Wang
  • Samuel Yang
  • John Z. Metcalfe
Abstract

High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurately genotyping an unknown sample for which a large number of possible variants may exist requires an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning and on a learned tolerance for deviations in reaction conditions. We tested the method in silico through multiple cross-validations, using curves generated under 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae, and obtained over 99% accuracy with 8 training curves per serotype. We then verified the algorithm in vitro using sequence variants of a cancer-related gene, obtaining 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enables reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.
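The abstract describes training a classifier on melt curves generated under several experimental conditions so that it tolerates run-to-run deviations. The sketch below illustrates that idea under stated assumptions, not the authors' method: each curve is modeled as a hypothetical two-state sigmoid, condition deviations are reduced to small melting-temperature (Tm) offsets, curves are compared through their min-max normalized negative derivative (-dF/dT), and a simple 1-nearest-neighbour rule is swapped in for the paper's classifier. The variant names, Tm values, and offsets are invented for illustration.

```python
import math

# Hypothetical two-state melt model: fluorescence drops sigmoidally as the
# amplicon denatures around its melting temperature tm (degrees C).
def melt_curve(tm, temps, width=0.8):
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

# Min-max scale to [0, 1] so runs with different signal levels are comparable.
def normalize(curve):
    lo, hi = min(curve), max(curve)
    return [(f - lo) / (hi - lo) for f in curve]

# Central-difference estimate of -dF/dT; its peak sits near tm.
def neg_derivative(curve, step):
    return [-(curve[i + 1] - curve[i - 1]) / (2 * step)
            for i in range(1, len(curve) - 1)]

def features(curve, step):
    return neg_derivative(normalize(curve), step)

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Each known variant is simulated under several Tm offsets that stand in for
# run-to-run reaction-condition deviations (an assumption of this sketch).
def build_training_set(variant_tms, temps, step, offsets):
    return [(label, features(melt_curve(tm + d, temps), step))
            for label, tm in variant_tms.items() for d in offsets]

# 1-nearest-neighbour in derivative-curve space (a simple stand-in classifier).
def classify(curve, training_set, step):
    f = features(curve, step)
    return min(training_set, key=lambda item: distance(item[1], f))[0]

step = 0.1
temps = [70.0 + step * i for i in range(201)]  # 70.0 to 90.0 degrees C
variants = {"wild_type": 80.0, "variant_A": 80.6, "variant_B": 79.3}  # hypothetical
training_set = build_training_set(variants, temps, step, offsets=[-0.2, 0.0, 0.2])

unknown = melt_curve(80.75, temps)  # variant_A measured 0.15 degrees C high
print(classify(unknown, training_set, step))  # variant_A
```

Because each known variant contributes several offset copies, an unknown measured slightly away from its nominal Tm still lands closest to its own variant; widening the offsets buys more tolerance at the cost of separating variants whose Tm values are close.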

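The "Nested Machine Learning…" entry among the similar resources names a one-vs-one support vector machine (OVO SVM) as this group's classifier. For k classes, OVO trains k(k-1)/2 binary classifiers, one per pair of classes, and labels a sample by majority vote. The minimal sketch below shows that voting scheme, assuming a linear SVM trained with the Pegasos stochastic sub-gradient method and hypothetical two-dimensional feature vectors per curve; the kernel, features, and training procedure the authors actually used are not given here.

```python
import itertools
import random
from collections import Counter

def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear SVM.

    ys are +1/-1. A constant 1.0 feature is appended so the bias is learned
    as an ordinary weight (a simplification: the bias is then regularized too).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularizer)
            if margin < 1.0:  # hinge-loss sub-gradient step
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def decision(w, x):
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

def train_ovo(xs, labels, **svm_kwargs):
    """One binary SVM per unordered pair of classes (first class of the pair = +1)."""
    models = {}
    for a, b in itertools.combinations(sorted(set(labels)), 2):
        pair = [(x, 1 if lab == a else -1) for x, lab in zip(xs, labels) if lab in (a, b)]
        models[(a, b)] = train_linear_svm([p[0] for p in pair],
                                          [p[1] for p in pair], **svm_kwargs)
    return models

def predict_ovo(models, x):
    """Each pairwise model votes; the most-voted class wins (ties broken arbitrarily)."""
    votes = Counter()
    for (a, b), w in models.items():
        votes[a if decision(w, x) >= 0 else b] += 1
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors for three classes of melt curves.
xs = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2],
      [3.0, 0.0], [3.2, 0.1], [2.9, -0.1],
      [0.0, 3.0], [0.1, 3.2], [-0.2, 2.9]]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
models = train_ovo(xs, labels)
print(len(models), predict_ovo(models, [3.1, 0.0]))  # 3 B
```

The pairwise models grow quadratically with the number of classes: with the 92 serotypes of the in silico test in the abstract, OVO would train 92 × 91 / 2 = 4186 binary models, each on only two classes' curves.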
Similar resources

Nested Machine Learning Facilitates Increased Sequence Content for Large-Scale Automated High Resolution Melt Genotyping

High Resolution Melt (HRM) is a versatile and rapid post-PCR DNA analysis technique primarily used to differentiate sequence variants among only a few short amplicons. We recently developed a one-vs-one support vector machine algorithm (OVO SVM) that enables the use of HRM for identifying numerous short amplicon sequences automatically and reliably. Herein, we set out to maximize the discrimina...

Study of differential allelic expression in the breast cancer intermediate-risk susceptibility genes (Tú NGUYEN-DUMONT)

Mutation scanning using high-resolution melting curve analysis (HR-melt) is an effective and sensitive method to detect sequence variations. However, the presence of a common SNP within a mutation scanning amplicon may considerably complicate the interpretation of results and increase the number of samples flagged for sequencing by interfering with the clustering of samples according to melting...

Detection and discrimination of two Brucella species by multiplex real-time PCR and high-resolution melt analysis curve from human blood and comparison of results using RFLP

Objective(s): Rapid and accurate detection of Brucella abortus and Brucella melitensis from clinical samples is so important because antibiotic treatment has major side effects. This study reveals a new method in detection of clinical samples of brucellosis using real-time PCR and high-resolution melt (HRM) curve analysis. Materials and Methods: 160 brucellosis suspicious samples with more tha...

High-resolution DNA melt curve analysis of the clustered, regularly interspaced short-palindromic-repeat locus of Campylobacter jejuni.

A novel method for genotyping the clustered, regularly interspaced short-palindromic-repeat (CRISPR) locus of Campylobacter jejuni is described. Following real-time PCR, CRISPR products were subjected to high-resolution melt (HRM) analysis, a new technology that allows precise melt profile determination of amplicons. This investigation shows that the CRISPR HRM assay provides a powerful additio...

An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely...

Volume: 9

Publication date: 2014